Transferring data from CSV files to SQL databases is a common task in data management. In this blog post, we’ll show you how to use Python to convert CSV data into an SQL database. Using libraries like pandas and sqlite3, you can easily automate this process, making your data workflow more efficient. Follow along to learn the essential steps and code snippets needed to seamlessly migrate your data from CSV to SQL.
We’ll be using the DC character dataset available on Kaggle.
You don’t need to install sqlite3 separately; it has shipped with Python since version 2.5.
We just import it along with pandas (pandas does need to be installed, e.g. via pip):
import sqlite3
import pandas as pd
Connecting to the SQLite3 database
If you’re using a Mac, SQLite comes pre-installed on your system. If you don’t already have sqlite3, you’ll have to install it. But don’t worry, the installation process is quite simple.
You don’t actually need to create the database file yourself. When you call the sqlite3.connect() function, it automatically creates the database if it doesn’t already exist in the folder, and you’ll see the file appear there under the name you gave it. Now we can connect to the database from Python with a simple command:
conn = sqlite3.connect('characters.db') # Connecting to the database
cursor = conn.cursor() # Object to run queries
Creating the table
To create an SQLite table, we’ll need to know all the columns beforehand. Pandas can help us with this: load in the CSV and run the info() method.
df = pd.read_csv('dc-wikia-data.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6896 entries, 0 to 6895
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   page_id           6896 non-null   int64
 1   name              6896 non-null   object
 2   urlslug           6896 non-null   object
 3   ID                4883 non-null   object
 4   ALIGN             6295 non-null   object
 5   EYE               3268 non-null   object
 6   HAIR              4622 non-null   object
 7   SEX               6771 non-null   object
 8   GSM               64 non-null     object
 9   ALIVE             6893 non-null   object
 10  APPEARANCES       6541 non-null   float64
 11  FIRST APPEARANCE  6827 non-null   object
 12  YEAR              6827 non-null   float64
dtypes: float64(2), int64(1), object(10)
memory usage: 700.5+ KB
We’ll need to add each of these columns to our query. For now, we’ll just assume all of them are strings.
Since the FIRST APPEARANCE column has a space in it, we’ll need to replace it with something else, like an underscore.
table_name = 'characters'
columns_with_types = ", ".join([f"{col.replace(' ', '_')} TEXT" for col in df.columns]) # Eg: page_id TEXT, name TEXT, urlslug TEXT, ...
create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} ({columns_with_types});"
cursor.execute(create_table_query)
cursor.fetchall()
I’ve decided to name my table characters. Within the query, we first use the concise CREATE TABLE IF NOT EXISTS command, which creates the table only if it doesn’t already exist. Then, for each of the columns in the dataframe, we build the string “<column_name> TEXT”, using Python’s replace method to swap the space in the column name for an underscore. This is in line with the SQLite syntax for creating a table:
CREATE TABLE <table_name> (
page_id TEXT,
name TEXT,
urlslug TEXT,
...);
Finally, we run the command and our table should be created. To check that it has been, run the following:
cursor.execute('pragma table_info(characters);')
cursor.fetchall()
You will see the following:
[(0, 'page_id', 'TEXT', 0, None, 0),
(1, 'name', 'TEXT', 0, None, 0),
(2, 'urlslug', 'TEXT', 0, None, 0),
(3, 'ID', 'TEXT', 0, None, 0),
(4, 'ALIGN', 'TEXT', 0, None, 0),
(5, 'EYE', 'TEXT', 0, None, 0),
(6, 'HAIR', 'TEXT', 0, None, 0),
(7, 'SEX', 'TEXT', 0, None, 0),
(8, 'GSM', 'TEXT', 0, None, 0),
(9, 'ALIVE', 'TEXT', 0, None, 0),
(10, 'APPEARANCES', 'TEXT', 0, None, 0),
(11, 'FIRST_APPEARANCE', 'TEXT', 0, None, 0),
(12, 'YEAR', 'TEXT', 0, None, 0)]
Now that our table has been created, we can iterate over the dataframe’s rows and add them one by one.
for index, row in df.iterrows():
    values = ", ".join([f'"{row_item}"' for row_item in row])
    insert_sql = f"INSERT INTO {table_name} ({', '.join(df.columns.str.replace(' ', '_'))}) VALUES ({values})"
    cursor.execute(insert_sql)
We use the df.iterrows() method to go through each row. Then we make a list of all the row items, ensuring that each item is a string surrounded by quotation marks. Then we use the INSERT command to add all the values to the table.
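A quick caveat: string-interpolating values like this is fragile. A name containing a double quote would break the query, and it’s the classic SQL-injection pattern. sqlite3 supports ? placeholders that handle quoting for us, and executemany() inserts all rows in one call. Here’s a minimal self-contained sketch of that approach, using a toy two-column dataframe in place of the full CSV:

```python
import sqlite3
import pandas as pd

# Toy stand-in for df = pd.read_csv('dc-wikia-data.csv');
# note the embedded double quote, which would break the interpolated query
df = pd.DataFrame({'name': ['Batman (Bruce Wayne)', 'Superman "Clark"'],
                   'FIRST APPEARANCE': ['1939, May', '1938, June']})

conn = sqlite3.connect(':memory:')  # in-memory database for the demo
cursor = conn.cursor()

# Create the table exactly as before
columns_with_types = ", ".join(f"{col.replace(' ', '_')} TEXT" for col in df.columns)
cursor.execute(f"CREATE TABLE IF NOT EXISTS characters ({columns_with_types});")

# One "?" placeholder per column; sqlite3 does the quoting for us
placeholders = ", ".join(["?"] * len(df.columns))
columns = ", ".join(df.columns.str.replace(' ', '_'))
insert_sql = f"INSERT INTO characters ({columns}) VALUES ({placeholders})"

# executemany inserts every row in a single call
cursor.executemany(insert_sql, df.itertuples(index=False, name=None))
conn.commit()

cursor.execute('SELECT COUNT(*) FROM characters')
print(cursor.fetchall())  # → [(2,)]
```

The same pattern drops straight into the loop above: replace the per-row execute() with a single executemany() over df.itertuples(index=False, name=None).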
To see if we’ve gotten all the rows, let’s check the number of rows in both the dataframe and the sqlite table.
df.shape # Returns (6896, 13)
cursor.execute('SELECT COUNT(*) FROM characters')
cursor.fetchall() # Returns [(6896,)]
So it seems that our code works. Finally, don’t forget to commit the changes and close the connection.
conn.commit()
conn.close()
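As an aside, pandas can do the table creation and the insertion in a single call with DataFrame.to_sql. Unlike our manual approach, it infers column types itself rather than forcing everything to TEXT. A minimal sketch, again with a toy dataframe standing in for the real CSV:

```python
import sqlite3
import pandas as pd

# Toy stand-in for df = pd.read_csv('dc-wikia-data.csv')
df = pd.DataFrame({'name': ['Batman', 'Superman'], 'YEAR': [1939.0, 1938.0]})
df.columns = df.columns.str.replace(' ', '_')  # a no-op here, but needed for FIRST APPEARANCE

conn = sqlite3.connect(':memory:')  # in-memory database for the demo

# to_sql creates the table and inserts every row in one call;
# if_exists='replace' drops and recreates any existing table of that name
df.to_sql('characters', conn, if_exists='replace', index=False)
```

The manual route is still worth knowing, since it gives you full control over the schema.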
Conclusion
In conclusion, converting a CSV file to an SQL database using Python and sqlite3 is a straightforward process that can be highly automated. By leveraging the pandas library for data manipulation and sqlite3 for database operations, you can efficiently transfer data between these formats. If you’re having trouble running the code, here’s the link to the IPython notebook.